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Abstract 

Background: Species complexes or aggregates consist of a set of closely related species often of different ploidy 
levels, whose relationships are difficult to reconstruct. The N Hemisphere Achillea millefolium aggregate exhibits 
complex morphological and genetic variation and a broad ecological amplitude. To understand its evolutionary 
history, we study sequence variation at two nuclear genes and three plastid loci across the natural distribution of 
this species complex and compare the patterns of such variations to the species tree inferred earlier from AFLP 
data. 

Results: Among the diploid species of A millefolium agg., gene trees of the two nuclear loci, ncpGS and SBP, and 
the combined plastid fragments are incongruent with each other and with the AFLP tree likely due to incomplete 
lineage sorting or secondary introgression. In spite of the large distributional range, no isolation by distance is 
found. Furthermore, there is evidence for intragenic recombination in the ncpGS gene. An analysis using a 
probabilistic model for population demographic history indicates large ancestral effective population sizes and 
short intervals between speciation events. Such a scenario explains the incongruence of the gene trees and 
species tree we observe. The relationships are particularly complex in the polyploid members of A millefolium agg. 

Conclusions: The present study indicates that the diploid members of A millefolium agg. share a large part of their 
molecular genetic variation. The findings of little lineage sorting and lack of isolation by distance is likely due to 
short intervals between speciation events and close proximity of ancestral populations. While previous AFLP data 
provide species trees congruent with earlier morphological classification and phylogeographic considerations, the 
present sequence data are not suited to recover the relationships of diploid species in A. millefolium agg. For the 
polyploid taxa many hybrid links and introgression from the diploids are suggested. 



Background 

Species complexes or aggregates consist of a set of closely 
related species often of different ploidy levels, whose rela- 
tionships are difficult to reconstruct. Such species com- 
plexes are common in angiosperms [1-5]. Rapid genetic, 
phenotypic and ecological differentiation on one hand, 
and hybridization/polyploidy on the other, play important 
roles in their evolutionary bursts [6-9]. The temperate N 
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Hemisphere common yarrow taxa form such a complex, 
i.e., the Achillea millefolium aggregate. Centered in SE 
Europe and SW to C Asia, its diploid species are limited 
to Eurasia, whereas the polyploids have spread throughout 
the N Hemisphere [10,11]. In N America, the 4x and 6x 
cytotypes form a complex of ecological races adapted to 
many different niches with marked genotypic diversifica- 
tion [6,12,13]. By cultivation in experimental gardens, 
Clausen et al, [6] documented local adaptation of A. mille- 
folium populations to environments along an altitudinal 
transect in California from sea level to alpine regions. This 
has become a classic example of rapid adaptive divergence 
of plant populations [12-18]. 
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Our earlier AFLP data have characterized the Achillea 
millefolium complex as a clade [10]. The inferred species 
relationships of the diploid members conformed to a bin- 
ary bifurcating tree and generally agreed with traditional 
species delimitations and taxonomic arrangements. The 
polyploid members appeared to be polytomic and poly- 
phyletic although geographic patterns can be recognized 
[10,11]. Available data show that frequent exchange of 
genetic materials have been involved in the origins of 
many polyploidy taxa. Meanwhile, it seems also to have 
occurred during the divergence of the diploid species 
[10,11,19]. So far, we are still uncertain about the demo- 
graphic history of A, millefolium agg. and the progenitors 
of many polyploid taxa. 

Due to their dominant nature, AFLP markers are diffi- 
cult to use for inferring genetic parameters of populations, 
particularly ancestral population sizes, split times, and 
migration rates. In principle, this can be accomplished bet- 
ter with DNA sequences either from organelles or from 
the nuclear genome [20,21]. Yet, we still meet challenges 
in practice: Plastid DNA variation is often too low to infer 
relevant gene trees with confidence. In addition, despite 
the lower effective population size of plastids, plastid gene 
trees may still not reflect the species tree due to incom- 
plete lineage sorting. With nuclear genes, frequent birth 
and death of gene copies, less lineage sorting due to higher 
effective population size, secondary introgression among 
split species as well as intragenic recombination tend to 
hamper interpretation of the patterns of polymorphism 
[22-25]. 

Here, we survey DNA sequence variation sampled at 
two nuclear loci and three plastid fragments from popula- 
tions across diploid, tetraploid, and hexaploid species 
throughout the natural distribution range of A. millefolium 
agg. and from three diploid congeneric species outside the 
aggregate. On the diploid level, we infer the demographic 
history among species and populations using the newly 
generated DNA sequence data in comparison to the rela- 
tionships inferred from the previous AFLP data. To this 
end, we also apply a probabilistic model (IMa2) [26,27] to 
three widespread diploid species to shed light on the key 
parameters, i.e., ancestral effective population sizes, time 
of speciation, and rates of gene flow. For the polyploid 
populations, we try to untangle their polytomic and poly- 
phyletic relationships which are probably complicated by 
gene flow on the same and between different ploidy levels 
using the co-dominant single-gene haplotype data. 

Methods 

Plant sampling 

We sampled thirty populations of seven diploid, seven 
tetraploid and four hexaploid taxa or cytotypes of the 
Achillea millefolium aggregate throughout the temperate 
N Hemisphere (Table 1). On average, two to three 



individuals from each population were analyzed. Broadly 
the same individuals were sequenced for the two nuclear 
loci and three cpDNA fragments; minor exceptions were 
due to repeated failures in sequencing a certain locus 
from a certain individual (see Table 1). Three diploid 
species, taxonomically outside the A. millefolium aggre- 
gate but included in previous AFLP analyses [10], were 
also sampled for this study. They are the W-Eurasian 
A. nobiliS'2x, the C-Mediterranean A. ligustica'2x and 
the East Asian A. acuminata'2x (Table 1). In addition, 
sequences of two cpDNA regions, trnH-psbA and trnC- 
ycf6, of 43 North American populations available from 
the NCBI data base (GenBank accessions EU128982- 
EUl 29456) [12] are incorporated into our plastid haplo- 
type network analysis. 

To check ploidy levels of the populations studied, two 
methods, either chromosome counting or DNA ploidy 
level determination, were applied using young flower buds 
or fresh or silica gel-dried leaves, respectively. Young 
flower buds were collected in the field and fixed in Car- 
noy s fluid (ethanobacetic acid = 3:1). To count the chro- 
mosome number, fixed flower buds were stained and 
squashed in 4% acetocarmine and observed under the 
microscope. DNA ploidy levels were investigated with pro- 
pidium iodide flow cytometry [28,29] from the prepared 
leaves. Information for ploidy levels obtained with the 
above two methods are marked in Table 1 by c and /, 
respectively, while those inferred from previous studies [12 
and Ehrendorfer et al., unpubl. Data] are marked by /. 

Voucher specimens are deposited in the herbaria of the 
Institute of Botany (WU) and the Department of Phar- 
macognosy (HBPh), both at the University of Vienna, 
Austria, and of the College of Life Sciences, Beijing Nor- 
mal University (BNU). 

Data sampling 

Total genomic DNA was extracted from ca. 0.02 g silica 
gel desiccated leaf materials following the 2 x CTAB pro- 
tocol [30] with slight modifications: Before the normal 
extraction process, sorbitol washing buffer was used to 
remove polysaccharides in the leaf materials (add 800 (iL 
sorbitol buffer to the ground leaf powder incubate the 
sample in ice for 10 min centrifuge at 10,000 g for 10 
min at 4°C and then follow the established 2 x CTAB 
protocol). 

Two nuclear genes were sampled and partially 
sequenced for this study. They are the chloroplast- 
expressed Glutamine Synthase gene (ncpG5') and the 
Sedoheptulose-Bisphosphatase [SEP) gene. The ncpG5' 
gene has been used in many plant phylogenetic studies 
and shown to be single-copy in all angiosperm species so 
far studied [31] and especially in Achillea [19]. We 
sequenced part of its coding and noncoding regions from 
exon 7 through to exon 11. The SEP gene has been 
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studied in representative taxa of the family Asteraceae 
[32]. It was shown to be single-copy by our preUminary 
analyses in several diploid species of Achillea, Readers are 
referred to Ma et al. [19] for primers used for amplifying 
the ncpG5' locus and to Chapman et aL [32] for the SBP 
locus (its exon 5 through to 7, i.e. the locus B12 in Chap- 
man etaL 2007). 

Three noncoding chloroplast DNA regions, trnH-psbA, 
trnC-ycf6 (including partial ycf6-psbM) and rpL16 were 
sequenced. PCR reactions were conducted with universal 
primer pairs [33]. 

The amplification was carried out in a volume of 20 (iL 
with final concentration of 1 x PCR buffer, 0.05 U exTaq 
(TaKaRa, Shiga, Japan) or HiFi (TransTaq DNA polymer- 
ase High Fidelity, TransGen Biotech), 200 \iM of each 
dNTP, 1% DMSO, 0.5 |iM of each primer, and with 1 (iL 
template DNA and ddH20 added to the final volume. The 
amplification was conducted on a Peltier thermocycler 
(Bio-RAD) initiated with 5 min of pre-denaturing at 94°C 
followed by 30 cycles of 1 min at 94°C, 30s at 48-55°C, 
and 1.5 min at 72°C. A final extension was then taken at 
72°C for 15 min followed by a hold at 4°C. The PCR pro- 
ducts were electrophoresed on and excised from the 1.0% 
agarose gel in TAE buffer. They were then purified using a 
DNA Purification kit (TianGen Biotech or TransGen Bio- 
tech, Beijing, China). The purified PCR products were 
either used for direct sequencing (for the cpDNA frag- 
ments) or ligated into a pGEM-T Vector (for nuclear 
genes) with a Promega Kit (Promega Corporation, Madi- 
son, USA). For sequencing the nuclear genes, about five to 
eight positive clones from each diploid and ten to fifteen 
from each polyploid individual were randomly selected for 
sequencing. The plasmid was extracted with an Axyprep 
Kit (Axygene Biotechnology, Hangzhou, China). Cycle 
Sequencing was conducted using ABI PRISM® BigDye™ 
Terminator. The same primers used for amplification (for 
cpDNA fragment) or the vector primers T7/Sp6 (for 
nuclear genes) were applied here. The sequenced products 
were run on an ABI PRISM™ 3700 DNA Sequencer (PE 
Applied Biosystems). 

Data analyses 

Sequences were assembled with the ContigExpress pro- 
gram (Informax Inc. 2000, North Bethesda, MD), aligned 
with ClustalX 1.81, and then manually improved with 
BioEdit version 7.0.1. To prevent possible sequencing 
errors, single mutations in the nuclear gene data sets likely 
generated by the cloning sequencing method were 
excluded from the analyses. Furthermore, unique 
sequences in the nuclear gene data matrix, which do not 
fall into any majority-rule consensus sequence group 
[19,34] or show inconstant branch positions in trees based 
on different subsets of data, i.e., with partial characters or 
randomly selected sequences, during the initial analyses 



were eliminated to avoid influence of PCR-mediated 
recombination [19,35,36]. The final numbers of indivi- 
duals/clones analyzed at each locus for each population 
are listed in Table 1. All the sequences analyzed were sub- 
mitted to the NCBI GenBank under accession numbers 
HQ601971-HQ602593 (the nuclear ncpG.S and SBP 
genes) and HQ450864-HQ451071 (the plastid loci). 

The allelic data sets of the two nuclear genes, ncpG^ and 
SBP, were analyzed separately, whereas sequences of three 
cpDNA fragments were combined as one locus. 

Gaps in the nuclear data sets were treated as missing 
data, whereas each indel position (no matter how many 
nucleotide sites it contained) of the plastid data set was 
coded as a binary character (0/1 = A/C) using the program 
GapCoder [37]. 

As A, millefolium agg. consists of species with short evo- 
lutionary history [10,11], Neighbor Joining (NJ), Maximum 
Parsimony (MP) and Median-Joining network were 
applied to the present data. For the nuclear sequences. 
Neighbour Joining and Parsimony analyses were per- 
formed with MEGA 5.05 and PAUP^^ 4.0b 10a, respectively. 
All nucleotide substitutions were equally weighted. Gaps 
were treated as missing data. We first analyzed data of the 
diploid species to show diversification of the gene lineages, 
and then of all the taxa to investigate relationships among 
the polyploids and diploids within A, millefolium agg. The 
NJ analysis was conducted with Kimura s 2-parameter dis- 
tances [38] and bootstrapped with 1000 replicates. For the 
MP method, heuristic searches were performed using 
1000 random taxon addition replicates with ACCTRAN 
optimization and TBR branch swapping. Up to 10 trees 
with scores larger than 10 were saved per replicate. The 
stability of internal nodes of the MP tree was assessed by 
bootstrapping with 1000 replicates (MulTrees option in 
effect, TBR branch swapping and simple sequence 
addition). 

Median-Joining network analysis implemented in Net- 
work ver. 4.5.1.6 available at http://www.fluxus-engineer- 
ing.com/sharenet.htm [39] was applied to the cpDNA data 
set. All variable sites were equally weighted and the homo- 
plasy level parameter (s) was set to zero given that varia- 
tion rates of the closely related species is low, especially in 
their plastid DNA. 

To understand the population demography at the time 
of speciation of the diploid species of A, millefolium agg., 
we applied a probabilistic model, the Isolation with Migra- 
tion Model for multiple populations implemented in IMa2 
[27], to three widespread and closely related species A, 
asplenifolia'2x and A. setacea'2x and A. asiatica'2x. 
These species are here regarded as three diverged popula- 
tions which share nuclear sequence variation. Shared 
alleles could reflect ancestral polymorphism or gene flow 
after separation of the populations or species. Assuming 
neutrality, retention of ancestral polymorphisms is likely if 
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speciation is fast relative to drift, which is inverse in inten- 
sity to the effective population size. As a rule of thumb, 
species are well separated with little ancestral polymorph- 
ism and thus almost complete lineage sorting, if the time 
of separation is at least as long as four times the effective 
population size [40]. Secondary genetic exchange between 
the diverging species can also lead to shared alleles 
observed [41]. The multipopulation model IMa2 allows 
both ancestral polymorphism and gene flow subsequent to 
divergence. It assumes a known history of the sampled 
populations, which can be represented by a rooted bifur- 
cating tree. In earlier analysis using AFLP data [10], we 
inferred the rooted species tree as: {{A. asiatica, A. aspleni- 
folia), A. setacea). We note that Ima2 provides posterior 
distributions of parameters, such that the confidence in 
the inference of each parameter can be obtained from 
observing the spread of the posterior distribution. The IM 
model also assumes neutral genetic variation, freely 
recombining unlinked loci and no intragenic recombina- 
tion or gene conversion [42]. Sequences of the two nuclear 
loci, the ncpG.S and the SBP genes, and of three plastid 
fragments were used for this analysis. The polymorphic 
sites of the sequenced nuclear and plastid loci are mostly 
of introns or intergenic spacers and thus should fit the 
neutral variation model. Using the four-gamete criterion 
[43], we do not find intragenic recombination in the 
nuclear sequences among these three species. The data of 
the three plastid fragments were combined because the 
chloroplast genes are generally linked and no evidence of 
recombination between the three regions is found. 

To run IMa2, one random haplotype per plant indivi- 
dual was chosen for the nuclear gene data sets, and the 
plastid data set was composed of sequences from the 
same plant individuals. This avoids bias but decreases 
the amount of information and thus leads to broader 
posterior distributions. The IS (Infinite Sites) model [44] 
of sequence evolution was chosen for the plastid locus, 
whereas, the HKY model [45] which allows for multiple 
substitutions was selected for the two nuclear loci 
because double mutations were found for a few poly- 
morphic sites at both loci. The inheritance scalar was 
set to 1.0 for the nuclear and to 0.25 for the plastid loci, 
respectively. 

To set upper bounds on the prior distributions of the 
parameters, we estimated for each of the three species 
the geometric means of the population mutation rate 
4Nu across all three loci using Watterson's estimator 0 
(per sequence not per site). The largest mean value was 
found with A. asiatica'2x. (an estimate of 4Nu = 9.8205), 
and this was used to set the upper bound on uniform 
prior for each of the three population demographic para- 
meters: population size {9 = 4Nu), splitting time {t = Tu, 
where T is the time in generations since the common 
ancestry, and it is of the same order of 4A/) and migration 



rates {2NM = 4NU x m/2). The priors were finally set as 
follows: the upper bound of population sizes q = 100, 
splitting times t = 5 and migration rates m = 2.0, respec- 
tively. We ran the Markov-chain Monte Carlo (MCMC) 
simulations with 1,000,000 burn-in steps and 20,000 gen- 
ealogies sampled per locus. The analysis was done with 
10 independent runs in the M mode, each using identical 
priors and 20 Metropolis-Coupled chains with different 
random number seeds. The genealogies sampled from 
the M mode runs were combined in an L mode run to 
build an estimate of the joint posterior probability of the 
parameters [26,27]. 

Results 

Nuclear gene trees with allelic haplotype sequences 

Amplification of the partial ncpG5' and SBP genes pro- 
duced a single clear band for each amplification. This 
and the results from earlier work (see "Methods") sug- 
gest sequences of the two nuclear loci obtained here 
each as belonging to a set of orthologs. 

After eliminating some sequences likely containing PCR- 
recombination (about 10% of the total), 303 sequences 
(clones) of the ncpG5' and 313 of the SBP gene from 
broadly the same 70 individuals of 29 populations belong- 
ing to A. millefolium agg. were used for the data analyses 
(Table 1). In addition, 56 ncpG5' and 36 SBP sequences 
from five populations of three congeneric species outside 
the A, millefolium agg.. A, nobiliS'2x, A. ligustica'2x and 
A. acuminata'2x, were also analyzed here. The ncpGS 
alignment contains 918 nucleotide positions with sequence 
length varying from 795 to 861 bps. The SBP alignment 
contains 420 nucleotide positions with sequence length 
varying from 386 to 405 bps. 

Prior to the analyses of all the diploid and polyploid 
taxa, we present the gene trees at the diploid level first 
(Figures 1 & 2). 

The diploid-only ncpG5' data come from seven species 
within and three outside A. millefolium agg.. They contain 
31 haplotypes with 134 substitution sites from 186 
sequences (Additional file 1: S-Figure 1). Out of the 134 
polymorphic sites, 122 are in introns. Intragenic recombi- 
nation among some samples is likely: A discordance in the 
alignment can be resolved by postulating a recombination 
in three of the four ncpG5' haplotypes of A. nobiliS'2x 
around the 89^^ polymorphic site (Additional file 1: S-Fig- 
ure 1). Another discordance can be resolved in the two 
haplotypes of A. cuspidata'2x (III in Figure lA) by postu- 
lating a recombination around the 26**^ polymorphic site 
between haplotype groups of A, millefolium agg. and of A, 
ligustica'2x (Additional file 1: S-Figure 1). 

Figure lA shows an unrooted Neighbour Joining phylo- 
gram based on the diploid-only ncpG5' data. The topol- 
ogy of the MP tree on the same data set is broadly 
comparable, and thus only bootstrap values from the MP 
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13=asi (AL1-2/7) 
14=asi (ARX-2/8) 
15=asp (NS2-1/3) 
16=asp (BZ-1/1;Ta-1/1); 

cer (10240-2/10) 
17=asp (BZ-1/2) 
18=asi (ARX-1/1) 
19=asi (NM-3/5) 



24=set(K4-1/1) 
25=set (K4-2/5) 
26=set(K4-1/1;SeAA-3/8) 
27=set (SeAA-2/2) 
28=set (GR-1/3;NS1-2/8) 



1=ros(Si6-1/1) 
2=ros(Si3-1/5; Si6-1/2) 
3=ros(Si6-1/1) 
4=ros (V-1/2) 
5=asp (BZ-2/6) 

6=asp(NS2-2/6;Ta-3/10);ros (V-1/3) 
7=asi(AL1-1/2;ARX-1/4; NM-2/3) 
8=lat (Geo-3/12) 
9=nob (Zn-1/5) 



Figure 1 The ncpGS gene tree and the AFLP tree. A. Unrooted Neighbour Joining pliylogram of 10 diploid Achillea species (seven of and 
tliree outside A. millefolium agg.) based on tine ncpGS gene sequence data. The tree contains 31 ncpGS haplotypes generated from 186 clones 
(sequences) from 49 individuals of 20 populations. Bootstrap supports (> 50%) from both methods (NJ/MP) are shown next to the major 
branches. Labels of terminal branches are written as "taxon abbreviation (population code-number of individuals/number of clones)". For taxa 
abbreviations, see Table 1. B. The AFLP tree from a previous study (Guo et oL, 2005 [10]) for comparison. 
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1=set(NS1-1/1; SeAA-2/8) 
2=set(NS1-1/2) 
3=asp(BZ-2/3;NS2-1/1) 
4=asp (NS2-2/7) 
5=asp(BZ-1/1;NS2-1/2); 

cer (10240-2/2); 

ros (Si6-1/5); 

set(K4-1/5; NS1-1/3) 
6=asp(BZ-1/1;NS2-1/5); 

cer (10240-1/2) 



97/99 




Figure 2 Unrooted Neighbour Joining phylogram of 10 diploid Achillea species (seven of and three outside A. millefolium agg.) based 
on the SBP gene sequence data. The tree contains 21 SBP haplotypes generated from 163 clones (sequences) from 35 individuals of 19 
populations. Bootstrap supports (> 50%) from both methods (NJ/MP) are shown next to the major branches. Labels of terminal branches are 
written as "taxon abbreviation (population code-number of individuals/number of clones)". For taxa abbreviations, see Table 1. 




7=asp(BZ-2/2;Ta-2/10); 

cer (10240-1/2); ros(Si3-1/5) 
8=asi(AL1-1/3); 

set(GR-1/5; K4-1/5;NS1-1/1; SeAA-1/2) 
9=asi (ARX-2/2; NM-1/1);asp (BZ-1/1) 
10=asi(AL1-1/2);cer(10240-1/3) 



11=lig (SN-1/4) 
12=lig (SN-1/1) 



13=acu (ARX2-2/11;CB1-2/6;TB5-1/2) 
14=acu (CB1-1/2; TB5-2/5) 
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analysis are presented in Figure lA. In this gene tree, 
haplotypes of A. millefolium agg. fall into two major 
groups, group I corresponding to A, setacea-l^, and 
group II those of the other Eurasian diploid species 
except A. cuspidata-ly.. This relationship agrees with that 
from the previous AFLP analysis (Figure IB inferred from 
Guo et aL, 2005). In haplotype group II, both the sub- 
groups Ila and lib harbor sequences of A. asplenifolia'2x 
and A. asiatiaca'2x. This is not in line with the AFLP 
tree and is most likely due to retention of ancestral poly- 
morphism or secondary contacts between the two spe- 
cies. A. asplenifolia'2x shares its diverged alleles with 
A. ceretanica'2x and A. roseoalba'2x, respectively, corre- 
sponding to the previous AFLP tree (Figure IB). 
Sequences of A. ceretanica'2x (relict in the Eastern Pyre- 
nees) only appear in lib and those of A, roseoalba'2x 
(from the meadows in the S-Alps) only in Ila. The 
sequences of A, cuspidata'2x (in the Himalayas) appear 
distant to other members of A. millefolium agg. but close 
to A. ligustica'2x (Figure lA). The single plant sample of 
A. cuspidata'2x was collected recently and thus was not 
included in the previous AFLP study. Clearly more sam- 
ples of this species should be investigated. Out of the 
four haplotypes of A. nobiliS'2x, one falls into clade Ila 
together with several members of A. millefolium agg., 
and three form a group between Ila and lib. The latter 
can be explained by the intragenic recombination visible 
in the alignment (Additional file 1: S-Figure 1). Consider- 
ing relationships of A, nobilis with A, millefolium agg., the 
ncpG<S gene tree is neither congruent with the AFLP tree 
(Figure IB) nor with the SBP tree (see below. Figure 2). 

The diploid-only SBP data also include seven species 
within and three outside A, millefolium agg. They contain 
21 haplotypes with 60 substitution sites based on 163 
sequences. Out of the 60 polymorphic sites, 48 belong to 
the intron regions. The alignment of the SBP sequences 
does not show obvious intragenic recombination (Addi- 
tional file 2: S-Figure 2). The topology of the NJ tree is 
broadly comparable with that of the MP tree and thus 
only bootstrap values from the MP analysis are presented 
(Figure 2). The SBP gene tree (Figure 2) is remarkably 
incongruent with the ncpG5' tree (Figure lA) and with the 
AFLP tree (Figure IB). In Figure 2, haplotypes belonging 
to members of A, millefolium agg. do not group together. 
Some of A, asiatica'2x, all of the Caucasus A, latiloba'2x 
and the Himalayan A. cuspidata'2x (here defined as Asian 
types) are distantly related to the others of A, millefolium 
agg. Surprisingly, sequences of the C European A, nobilis- 
2x are close to the Asian type, whereas, haplotypes of the 
E Asian A. acuminata'2x are close to the major haplotype 
group of A, millefolium agg., which is mostly of the 
European members. We thus observe little sorting of 
ancestral polymorphisms of the SBP gene during the spe- 
ciation processes of the diploid species oi Achillea, 



Figure 3 shows the ncpG5' gene tree (unrooted NJ tree) 
of all the diploid and polyploid taxa of A, millefolium agg. 
and of three congeneric diploid speices. It is based on 110 
haplotypes generated from 359 sequences with 155 poly- 
morphic substitution sites. In Figure 3, alleles of each poly- 
ploid individual, population or taxon (marked in different 
colors) scatter among haplotypes of different diploid spe- 
cies (all in black letters) of A. millefolium agg. except 
A. cuspidata'2x, A. roseoalba-^x and A. asiatica-^x share 
some of their alleles with their diploid cytotypes, respec- 
tively, whereas A, ceretanica-^x is quite differentiated 
from A. ceretanica'2x. Alleles from the tetraploid A. bor- 
ealis var. alpicola and var. lanulosa-^x in N America, the 
tetraploid A. asiatica-^x in C Asia and the hexaploid 
A. millefolium subsp. apiculata-Gx in NE Europe are more 
often associated with each other than with those of other 
polyploid taxa. Only the Ukrainian A. inundata-^x and 
the C Asian A, schmacovii-Gx share nuclear haplotypes 
with A. setacea'2x. 

The date set of SBP gene of all the diploids and poly- 
ploids contains 68 haplotypes generated from 349 
sequences with 68 polymorphic substitution sites. Due to 
the severe conflicts between the SBP gene tree and the 
species tree inferred from the AFLP and morphological 
data, sequences of this gene are not suitable for the phylo- 
genetic inference, but could provide some clues about the 
progenitors of the polyploid taxa. We therefore only pre- 
sent the SBP gene tree of all the diploid and polyploid 
samples in the supplementary materials as Additional 
file 3: S-Figure 3. 

Phylogenetic networks based on plastid haplotypes 

Thirty populations with broadly the same 70 individuals 
analyzed with the nuclear genes were sequenced at 
three plastid loci, trnH-psbA, trnC-ycb6 and rpll6. The 
length variation and number of polymorphic sites of 
each fragment are listed in Table 2. 

For the diploid members of A. millefolium agg. together 
with their sister species A, ligustica, three plastid frag- 
ments from 18 populations and 34 individuals generated a 
combined matrix with 1855 (varying from 1814 to 1842) 
nucleotide positions and 26 variable sites. Out of the 26 
variable sites, 18 are substitution sites and 8 are indels 
(Table 2). The polymorphic sites allow the identification 
of 13 plastid haplotypes named as dHl-13, where "d" 
stands for diploids to be distinguished from those used for 
the diploid- polyploid combined data as described below. 
As shown in Figure 4, polymorphic plastid haplotypes are 
found within each of the three relatively widespread spe- 
cies A. setacea'2x, A, asplenifolia'2x and A, asiatica'2x. 
Furthermore, distribution of the plastid polymorphism is 
not even among the diploid species. Among the three 
widespread species, the European A. setacea and A. asple- 
nifolia each harbours a relatively frequent haplotype, dH7 
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alp (US2-1/1); lan(US5-l/l) 
alp (US2-1/1) 

alp (US2-1/4); Ian (US5-1/1) 
alp (US2-1/1) 
alp (US2-1/3) 

ros (V-1/2); api (Rb-1/1); Ian (US5-1/1) 

1) 

lan(US5-l/l) 
api (Rc-1/1) 
lan(US6-l/l) 
inu (K13-1/1) 
lan(US6-l/l) 
lan(US6-l/l) 

api (Rc-1/1); Ian (US5-1/1; US6-1/5) 
api (Rc-1/1) 
ros-4x (V-1/1) 
sch (AL5-1/1) 

I asp (BZ-2/6); api (Ra-1/2); asi-4x (AL9563-2/4); 

I inu (K13-1/2); sud (STms-1/3 );ros-4x (V-1/1); sch (AL5-2/3) 

inu(K13-l/l) 

ros-4x (V-1/1) 

ros (Si6-1/1) 

ros (Si3-l/5; Si6-l/2); ros-4x (Si6-l/3) 

ros (Si6-1/1) 

inu(K13-l/l) 



asi (ALl-1/2; ARX-1/4; NM-2/3); asi-4x(AL3-l/3; AL9563-1/2) 
ros-4x (Si6-1/1) 
cer-4x (10222-1/1) 
cer-4x (10222-1/1) 
api (Ra-l/3;Rc-l/2) 

api (Rb-1/5); cer-4x (10222-2/7); sud (STms-l/3);ros-4x (Si3-l/2) 
ros-4x (V-1/1) 

asp (NS2-2/6; Ta-3/10); ros (V-1/3); ros-4x (V-1/2) 



dis (DIKF-1/1) 

asi-4x (AL3-1/2); dis (DIKF-1/2) 
dis (DIKF-1/1) 
asi-4x (AL3-1/1) 
alp (US2-1/1) 

1/1) 

asi (ALl-2/7); asi-4x (AL9563-1/3) 
3-1/5) 

asi (ARX-2/8); asi-4x (AL9563-1/2) 
_.9563-l/l) 
. . .L9563-1/1) 
api (Ra-1/1) 
api (Ra-1/1) 
lan(US6-l/l) 
lan(US6-l/l) 

asp (BZ-1/1; Ta-1/1); ca- (10240-2/10); ros-4x (Si3-1/1) 

asp (NS2-1/3) 

api (Ra-1/1) 

api (Ra-1/1) 

api (Ra-1/1) 

asi(ARX-l/l) 

asp (BZ-1/2) 

api (Ra-1/1; Rc-1/1) 

asi (NM-3/5); api (Ra-1/1); Ian (US6-1/1); dis (DlKF-1/3) 
api (Rc-1/1) 
ros-4x (V-l/I) 

sud(STms-l/l) 




dis (DIKF-1/1) 

dis (DIKF-2/2) 
alp (US2-1/1) 
sch(AL5-l/l) 
sch(AL5-l/l) 
nob (ZN-1/5) 

api (Ra-1/2); Ian (US5-1/4; US6-1/3); asi-4x (AL3-1/1) 

Ian (US5-1/1) 
lan(US5-l/l) 
1) 

lat (Geo-3/12) 
scli(AL5-l/l) 
sch(AL5-l/l) 
cer-4x (10222-1/1) 

1/1) 

ros-4x (V-1/2) 
api (Ra-1/1) 
api (Ra-1/1) 

api (Ra-1/2); lan(US5-l/l) 



ros-4x (V-1/1) 

r-^ inu(K13-l/l) 

\ api (Ra-1/1) 

81/771 lan(US6-l/l) 

, nob(ZN-l/l) 

54;-i nob(ZN-2/9) 

1 nob(ZN-3/9) 

set (K4-2/5) 
set (K4-1/1; SeAA-3/8) 
set (K4-1/1) 

set (GR-1/3; NSl-2^); inu (K13-1/2) 
set (SeAA-2/2) 
inu (K13-1/1) 
sch(AL5-l/l) 
cus (ID- 1/5) 
cus (ID- 1/2) 
lig (SN-1/2) 
lig (SN-1/1) 

acu (ARX2-3/12, CBl-3/8) 
acu (TB5-1/2) 
acu (TB5-3/7) 

Figure 3 Unrooted Neighbour Joining cladogram of the ncpGS gene of all the diploid and polyploid taxa analyzed in this study. The 

tree contains 109 allelic haplotypes generated from 359 sequences with 155 substitution sites. Topology of the MP tree on the same data set is 
broadly comparable with that of the NJ tree. Bootstrap supports (> 50%) from NJ/MP analyses are shown next to the branches. Label of each 
terminal branch is written as "taxa abbreviation (population code-number of individuals/number of clones)". For taxa abbreviations, see Table 1. 
Diploid taxa are in black, polyploid taxa in different colours. 
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and dH13, respectively, whereas A. asiatica'2x in Asia 
exhibits 5 haplotypes. 

To illustrate the formation of the polyploids and their 
worldwide migration, we incorporated the plastid trnH- 
psbA and trnC-ycf6 sequences obtained by Ramsey et aL 
[12] from the 4x and 6x N American A. borealis popula- 
tions available from the NCBI data base (accession No. 
EU128982- 129456) into our diploid-polyploid combined 
data. Our rpL16 intron sequences were left out because 
this gene was not sequenced for the populations ana- 
lyzed by Ramsey et al, [12]. The resulting data matrix 
thus contains 156 individuals and 1024 nucleotide posi- 
tions (sequences varying from 960 to 989 bps in length). 
This allows identification of 26 haplotypes (Hl-26) on 
the basis of 29 variable characters, of which 21 are 
nucleotide substitutions and 8 are indels (Table 2). Rela- 
tionships among these haplotypes are shown in Figure 5. 

HI is the most frequent haplotype in Eurasia. It is 
shared by three out of four European diploid species and 
is spread among most of the Eurasian polyploid taxa at all 
ploidy levels (Figure 5). The rare Hll in Europe (only in A 
asplemfolia'2x) is related to the Eurasian H5, to most of 
the Asian types H 6-9 &15, and even to H14 which is the 
most frequent in N America. The European polyploid spe- 
cific H3 (in A. millefolium subsp. apiculata-^x) and H13 
{A. millefolium subsp. sudetica-Gx &A. inundata-^x) are 
directly or indirectly related to H4 found in A, asiatica-^x 
& -4x. The N American frequent H14 has been derived 
most probably from the E Asian HIO and evidently gave 
rise within N America to several rare and more local hap- 
lotypes (H18-22, 24, 26). 

The 26 plastid haplotypes are mapped associated with 
the general distribution of populations studied here in 
the Additional file 4: S-Figure 4. 

Demography of major diploid lineages 

Figure 6 shows the plots of the joint posterior probability 
distributions of each of the demographic parameters. 
Panel A of Figure 6 indicates sharp peaks close to zero for 
the effective population sizes of the three species after 
their splits, i.e., (of A. asiatica-2x), ql (of A. asplenifo- 
lia'2x) and q2 (of A. setacea'2x). The speciation times are 
rather short (panel B), especially tO, i.e., the split between 



A. asiatica-2x and A. asplenifolia'2x, is inferred to have 
occurred very recently. The speciation time tl is bounded 
away from zero, which corroborates our assumption of 
A. setacea splitting first. We note that these inferred para- 
meters are consistent with the data. While the effective 
population size qO is relatively small, ^0 is very short, such 
that their ratio qOltO is still relatively large, which corre- 
sponds to relatively little drift and thus the relatively large 
diversity in A. asiatica-2x compared to the other two spe- 
cies. Drift is higher in the other species: The effective 
population size within A, asplenifolia'2x, i.e., ^1, is esti- 
mated to be even smaller than qO of A, asiatica-2x, which 
combines with ^0 to explain the reduced diversity in this 
species compared to A. asiatica-2x. The estimate of effec- 
tive population size in A. setacea'2x, q2, is about the same 
as qO in A, asiatica-2x, however the time since the split, i. 
e., t\ is longer, such that the product ^2/^1 is smaller than 
^0/^0 in A. asiatica-2x. Note that the combination of short 
times and small effective population sizes means that the 
species are mainly differentiated by drift and not by 
mutations. 

In contrast, all ancestral population sizes are large 
compared to the current ones: cp>, the ancestral effective 
population size before the split of A. asiatica'2x and 
A. asplenifolia'2x, and q^, the ancestral population com- 
mon to all three species are bounded away from zero. 
The distributions of the migration parameters (panel C) 
are broad and quite similar to the equal prior distribu- 
tions, such that we conclude that there is little informa- 
tion for inference of these parameters in the data. 
Setting wider maximum priors for m did not result in 
convergence. We note that differentiation of subdivided 
populations that exchange few migrants and populations 
that split and afterwards do not exchange migrants lead 
to rather similar molecular variation patterns. Therefore, 
differentiating between migration and drift and temporal 
population subdivision is difficult. This likely explains 
the broad posterior distribution of the inferred migra- 
tion parameters. 

Discussion 

Within the Achillea millefolium aggregate, diploid spe- 
cies are usually well separated and their relationships 



Table 2 Sequence characters of the analyzed cpDNA non-coding regions 





trnH- 
psbA 


tmC-ycf6 (incl. 
partial ycf6-psbM) 


rpU6 (for diploid 
taxa only) 


All three fragments combined 
for diploid taxa only 


trnH-psbA and tmC-ycf6 
combined for all taxa 


Length 


424- 
445 bp 


537-564 bp 


849-876 bp 


1814-1842 bp 


960-989 bp 


Number of haplotypes 


13 


18 


8 


13 


26 


Number of variable sites 


11 


18 


9 


26 


29 


Number of indels (length 
in base pairs) * 


3 (5; 1; 
21) 


5 (1; 6; 1; 21) 


4 (5; 5; 1; 22) 


8 (1; 6; 5; 1; 5; 5; 1; 22) 


8 (1; 6; 1; 6; 21; 5; 1; 21) 



^ Indels were coded as binary characters (0/1 = A/C) using software GapCoder [37] 
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A. roseoalba-2x 




dH2 



latiloba-2x 
dH9 



outgroup: 
^^^^A. ligustica-2x 



I A. asiatica 
I A. ceretanica 
I A. asplenifolia 
A. setacea 



dHll 



dH7 




dH12 



dH5 



dH6 



dHl 
dH2 
dH3 
dH4 
dH5 
dH6 
dH7 

dH8: 
dH9: 
dHlO 
dHll 
dH12 
dH13 



asi(ALl-l) 
asi(ALl-2, 3) 
asi(ARX-l; N 
asi(ARX-3) 
asi(NM-l, 3) 
asp(BZ-2, 6) 
asp(NS2-6, 7, 11; Ta- 
cer (10240-10, 15) 
cus(ID-l) 
lat (Geo-1, 2, 3) 
lig(SN-lO) 
ros(Si3-l; Si6-2 
set (GR) 

set(K4-2, 3; NSl 



tiH9. 



2,4; SHB-1) 



-3,4) 



V-5) 

4, 8; SeAA-8, 10) 



Figure 4 Medium-Joining Network (A) and unrooted NJ tree (B) of 13 cpDNA haplotypes (dHl -13) across 34 individuals of 18 
populations of the diploid species of A millefolium agg. and the congeneric A ligustica-2x. These haplotypes are generated from three 
noncoding cpDNA regions, trnti-psbA, trnC-ycfd and rpL16. Short bars on branches of the network indicate the nurmber of variable sites (incl. gap 
polymorphisms, see Table 2). Red numbers next to the branches of the NJ tree are bootstrap values. Plant individuals are labelled as "taxa 
abbreviation (population code-individual identity No.)". For taxa abbreviations, see Table 1. 



conform to a binary bifurcating tree according to the 
AFLP data [10]. At the polyploid level, species are diffi- 
cult to define either by morphology or by molecular 
data. Previous AFLP data show the polyploid taxa 
mostly polytomic and polyphyletic [10,11]. With the 
haplotype sequence data from both nuclear and plastid 
genomes available here, we try to infer population his- 
tory and demography of the diploid species and to 
untangle the complex relationships among the 
polyploids. 

Gene trees versus species trees of the diploids 

The gene trees from the nuclear and the plastid loci of the 
diploid populations of A. millefolium agg. are incongruent 
with each other and with the previous AFLP species tree. 
None of the gene trees corresponds well with the morpho- 
logical and cytogenetical differentiation of the diploid spe- 
cies, whereas the AFLP tree does (Figures 1, 2 & 4; [10]). 



For the ncpG<S gene, haplotypes of each of the widespread 
A, asplenifolia'2x and A. asiatica'2x belong to two groups, 
Ila and lib (Figure lA). Relevant ncpG5' sequences indicate 
intragenic recombination (Additional file 1: S-Figure 1). 
The SBP gene tree conflicts more severely with the AFLP 
tree and the species delimitation (Figure 2). Even the plas- 
tid sequences show polymorphic haplotypes within each 
of the three widespread species A, setacea'2x, A. asplenifo- 
lia'2x and A, asiatica'2x (Figure 4). As the phylogenetic 
relationships of these three species indicated by the AFLP 
tree (Figure IB) are in line with the morphological and 
biogeographical information, we suggest that the gene tree 
incongruence as well as their discordance with the inferred 
species tree (asserted by the AFLP tree) are due to a lack 
of sorting of ancestral polymorphic alleles and/or due to 
introgression after the split of the species. Assuming neu- 
trality, retention of ancestral polymorphism is likely if spe- 
ciation is fast relative to drift within the populations [41]. 
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H5: set(K4-2,3; NSI-4,8; SeAA-8,10); 
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H23: bor-lan (Washington: Deception Pass-1, 2; 
Horsethief-1); bor-lit (British Cloumbia: Rathtrevor 
Beach-2) 
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(HI 8-26 and partly H 14 are cited from Ramsey etal., 2008) 



Figure 5 Medium-Joining Network (A) and unrooted NJ tree (B) of 26 cpDNA haplotypes (HI -26) across the N Hemisphere 
populations of the diploid and polyploid taxa of A millefolium agg. The data are based on sequences of two cpDNA noncoding regions, 
trnH-psbA and trnC-ycfd. Short bars on branches of the network indicate the number of variable sites (incl. gap polymorphisms, see Table 2). Red 
numbers next to the branches of the NJ tree are bootstrap values. Plant individuals are labelled as "taxa abbreviation (population code-individual 
identity No.)". For taxa abbreviations, see Table 1. 



The quantitative results from the IM model (Ima2 pro- 
gram [27]) show large ancestral effective population sizes, 
short splitting time between them and some migration. 
This corroborates our inference from the discordant 
nuclear gene trees and suggests rapid speciation and/or 
occasional exchange of migrants at the diploid level within 
A. millefolium agg.. 

The same pattern apparently also characterizes the more 
distantly related congeneric species, e.g., A. nobiliS'2x, A. 
ligustica'2x and A. acuminata'2x (Figures 1 & 2). In A. 
nobilis, a diploid species relatively close to A, millefolim 
agg., we find intra-locus recombination in the ncpG5' gene, 
indicating its hybrid origin or introgression involving a 
diploid donor from A, millefolim agg. (Additional file 1: S- 
Figure 1; Figure lA). 

Phylogeography and rapid speciation 

Throughout the N Hemisphere, gene flow between 
divergent lineages through periods of climate changes 



has shaped the extant geographical distribution and pat- 
terns of genetic variation of many plant species [46-51]. 
Rapid speciation resulted from post-glacial hybrid con- 
tacts and polyploidy during population migration have 
often been reported and documented. For example, 
Brysting et al [34,52] has untangled the complex history 
of the polyploid Cerastium alpinum group in the cir- 
cumpolar arctic flora. Complexity is also evident in the 
evolutionary radiation of A millefolium agg [10-12]. 

The diploid taxa of A, millefolium agg. are limited to 
Eurasia, following an eco-geographical vicariance pattern 
but appear to be disjunctive and regressive evidently 
under the pressure of their more expansive polyploid 
descendants: A, ceretanica-^x, a relict endemic in subal- 
pine grassland of the E Pyrenees; A. asplenifolia, an 
endemic relict of humid lowland grasslands in the Pan- 
nonian plains from Bulgaria, Hungaria, E Austria and 
the adjacent Czech Rep.; A, roseoalba'2x, a variable 
taxon of mesic forest margins and anthropogenous 
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Figure 6 Marginal posterior probability density for each of the population size {0, written as q in panel A), splitting time (t) and 
migration rates [m) parameters under a multi-population model [Mai) for three major diploid species A asiatica-2x, A. asplenifolia-2x 
and A. setacea-2x. In panel A & C, the number "0" stands for A. asiatica-2x, "1" for A. asplenifolia-2x, "2" for A. setacea-2x, "3" for the common 
ancestral population of A osiatico-2x and A. osplenifolio-2x, and "4" for the ancestral population common to all three species. In panel B, tO refers 
to the splitting time of A. asiotica-2x and A. asplenifolio-2x, and tl represents the time when A. setacea-2x split from the ancestor of A. 
osplenifolio-2x and A. osiotico-2x. In panel C, " > " indicates direction of migrants, e.g., mO > 1 refers to migration from 0 {A. asiotico) to 1 {A. 
asplenifolia). Each curve is the sum of 10 curves from the analysis of 10 independent MCMC simulations. 
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meadows in the geologically recent N Italian plains and 
foothills with populations in adjacent Switzerland, 
Austria, and locally even in Bavaria and Slovenia where 
it is in close contact with A. asplenifolia'2x; A. setacea 
in steppes from SW Asia and SE Europe to continental 
areas of C Europe and the Alps; A. latiloba, today a sub- 
alpine relict in the mountains of NE Turkey and SW 
Georgia; A. asiatica-^y. in montane to alpine grassland 
from the Altai to Mongolia and N China; and A. cuspi- 
data'2y., a relict in the W Himalayas. 

The previous AFLP data [10] indicate that A, setacea- 
2x branches early within the A, millefolium agg. The pre- 
sent sequence data partly show compatible patterns of 
relationships, i.e., the European species A. ceretanica'2y., 
A. roseoalba'2x and A. asplenifolia'2x are more closely 
related to each other and to the Asian A. asiatica'2x 
than to A. setacea'2x (Figures 1 & 4). The morphology, 
distribution and habitat preferences suggest that 
A. roseoalba'2x might have originated from introgression 
into A. asplenifolia possibly by the A. ceretanica-like 
populations [15,53]. Compared to its European sister spe- 
cies, A. asiatica'2x represents a wide geographic exten- 
sion of A. millefolium agg. into C and E Asia and harbors 
the richest plastid variation among all the diploid taxa 
(Figure 4). It could have survived the cold periods of the 
late Pleistocene in refugia not far from its present occur- 
rence in or near the Asian mountain areas. In spite of the 
large distributional range, we find little evidence for isola- 
tion by distance: The easternmost A, asiatica'2x and the 
westernmost A. ceretanica'2x share as much genetic 
variation as each with the geographically intermediate 
A, asplenifolia'2x and A, roseoalba'2x (Figures 1 & 2). 

In contrast to the diploid species that show phyloge- 
netic structuring and are limited to Eurasia, the poly- 
ploid taxa of A. millefolium agg. exhibit practically 
continuous and interrelated relationships and have 
extended their range to N America. With the available 
nuclear ncpG5' and SPB sequence data alone, it is often 
hard to decide whether a polyploid taxon studied here is 
auto- or allopolyploid because even the diploids share 
polymorphic alleles likely due to incomplete lineage 
sorting and/or secondary introgression. Only combined 
with the previous AFLP profiles [10], the autopolyploid 
nature of some may become clear, but most are influ- 
enced by hybridization. 

In Europe, A. ceretanica-^x in central France exhibits 
some ncpG5' correspondence (Figure 3) with A, roseoalba- 
4x and AFLP affinities with A, ceretanica'2x [10] but 
otherwise little affiliation with its diploid cytotype. In 
populations of A. roseoalba'2x, 4x- individuals occur with 
a new plastid haplotype (H16 in Figure 5) and correspond- 
ing ncpG5' alleles (Figure 3). A, styriaca-^x, an endemic 
from E Austria and the Czech Rep., is ecologically distinct 
but shares ncpG5' and SBP alleles with A. roseoalba-^x 



and the plastid haplotype HI with A. asplenifolia'2x and 
so on. A. setacea'2x and A, asplenifolia'2x have been 
involved in the origin of the widespread and expansive C 
to E European allotetraploid A. collinaAx [10,19]. All the 
4x-taxa mentioned above are connected by occasional 
intermediates, can be easily hybridized in crossing experi- 
ments and produce viable progeny with more or less nor- 
mal meioses [54,55]. Further to the east in Ukraine, 
A. inundata-^x is linked to A. setacea-2x and also to 
A. asplenifolia'2x etc, (Figures 3 & 5). 

The diverse 4x-cytotypes from C and E Asia are provi- 
sionally called A, asiatica-^x. They are particularly linked 
to A, asiatica'2x, but also to A, cuspidata-2x (Figure 5; 
Additional file 3: S-Figure 3). Plastid HI and H5 also 
demonstrate links of A. asiaticaAx with the European 2x- 
and 4x-taxa (Figure 5). How complex the relationships of 
higher polyploids in A. millefolium agg. are is shown by 
the recently described Altai endemic A. schmakovii-Gx: it 
shares nuclear haplotypes with A. setacea'2x, A. asplenifo- 
lia'2x, A. inundata-^x, A. asiatica-^x and has the rare 
plastid haplotype H7 from A. asiatica'2x. 

The Pleistocene extension of A. millefolium agg. into N 
America [12] with populations called A. borealis s.lat. and 
two main cyto types (4x and 6x [6]) is of particular interest. 
They have developed ecotypes in most habitats from the 
sand dunes of the Pacific to the peaks of Rocky Mts. and 
the East Coast forests. Their rapid adaptive differentiation 
has been well documented by Clausen et al, [6] and Ram- 
sey et al, [12], The molecular genetic data from Guo et al, 
[10,11] and Ramsey et al, [12] have shown that all these 
native N American populations differ from those in Eura- 
sia, are monophyletic and most likely linked to A, asiatica- 
2x1 ^x (called A. millefolium var. manshurica Kit. [12]), but 
can not be resolved in more details. In the ncpG5' and SBP 
gene trees (Figure 3; Additional file 3: S-Figure 3), they 
share alleles with A, asiatica'2x and -4x, but also with 
A, millefolium-Gx [A, apiculata) in subarctic Russia and 
even with A, inundataAx, A, ceretanicaAx and A, aspleni- 
folia-2x. Considering plastid haplotypes, the most frequent 
H14 is likely the ancestral, from which H18-H22, 24, 26 
have originated (Figure 5). All these and additional fossil 
data [12,56,57] support the assumption that the ancestors 
of the N American populations might have survived in NE 
Asian/ Alaskan refugia ("Beringia") during the middle or 
late Pleistocene cold periods [12,49]. Apart from the west- 
ward extension through Siberia to subarctic Europe 
[A, millefolium-Gx/A, apiculata-Gx), their main migrations 
might have been east- and southward into N America. 
There, without the competition of closely related taxa, 
they underwent a radiation and formed the 4x and 6x 
racial complex of A, borealis s.lat.. This can be regarded 
quasi as a model for the early phase of the eco-geographi- 
cal radiation of A, millefolium agg. in SE Europe and adja- 
cent SW Asia, and a second phase with A, asiatica-2x+^x, 
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A. alpina-Aix and A. schmakovii-Gx in C and E Asia 
[10,11,15,58,59]. 

Conclusions 

Nuclear and plastid haplotype data analyzed in this 
study suggest rapid diploid and polyploid speciation in 
the temperate N Hemisphere Achillea species, especially 
those within A, millefolium agg.. Hybridization and poly- 
ploidy seem to have promoted the recent lineage radia- 
tion, shaped the concurrent patterns of genetic 
variation, and contributed to the wide distribution of 
this species complex. 

The sequence data from two nuclear genes and chloro- 
plast DNA employed in this study result in incongruent 
trees, obviously due to lack of lineage sorting and/or sec- 
ondary hybridization, and thus cannot resolve the species 
phylogeny. This lack of lineage sorting apparently extends 
to other congeneric taxa. To date, the AFLP tree [10] is 
the only molecular tree that can be brought into accor- 
dance with species delimitations, morphology and tradi- 
tional taxonomy of A. millefolium agg.. This is likely due 
to the averaging effect of the genome-wide sampling of 
AFLP polymorphism. 

With little lineage sorting and frequent gene flow, the 
species tree can only be recovered using data from many 
unlinked DNA regions. Despite the development of new 
techniques such as the RAD tag technology [60], tradi- 
tional molecular methods, such as AFLP genome screen- 
ing, are still useful for non-model species, especially if 
complemented by likelihood-based Bayesian analytical 
methods [61]. On the other hand, even if sequence data 
from the nuclear and chloroplast genomes can not over- 
ride the AFLP data in resolving recent species divergence, 
they do help to understand population demography and 
speciation processes, and to demonstrate that shared 
ancestral polymorphisms are more common than fixed 
alleles in young radiating species. Therefore, for the phylo- 
geographic surveys of non-model organisms, we advocate 
the use of DNA polymorphisms from multiple unlinked 
loci, e.g,, AFLP markers, combined with sequence data 
from some single genes, as such a combination appears 
useful and cost and time efficient. 

Additional material 



recombination around tine 89^ poiymorpiiic site between two or tinree 
haplotype groups of A millefolium agg. (one above A nobilis, and the 
others below). The sequences of /\. cuspidate (cus) seems also containing 
a recombination around the 26^^ polymorphic site between those of A 
ligustica (lig_SN) and some of A. millefolium agg. (e.g., several sequences 
at the top of the matrix). 

Additional file 2: The aligned polymorphic sites among 21 SBP 
haplotypes. These haplotypes are generated from 60 substitution sites 
among 163 clones (sequences) from 35 individuals of 19 populations of 
10 diploid Achillea species (seven within and three outside A. millefolium 
agg.). Abbreviations of the species: (1) of A millefolium agg.: asi = A 
asiatica, asp = A. asplenifolia, cer = A. ceretanica, cus = A. cuspidate, lat = 
A. latiloba, ros = A. roseoalba and set = A. setacea; (2) of other species: 
acu = A acuminata, lig = A ligustica and nob = A. nobilis. Title of each 
haplotype sequence includes: abbreviation of species (number of 
populations/individuals/clones). 

Additional file 3: Unrooted Neighbour Joining cladogram of the SBP 
gene of all the diploid and polyploid taxa analyzed in this study. 

The tree contains 68 allelic haplotypes generated from 349 sequences 
with 68 substitution sites. Topology of the MP tree on the same data set 
is broadly comparable with that of the NJ tree. Bootstrap supports (> 
50%) from NJ/MP analyses are shown next to the branches. Label of 
each terminal branch is written as "taxa abbreviation (population code- 
number of individuals/number of clones)". For taxa abbreviations, see 
Table 1. Diploid taxa are in black, polyploid taxa in different colors. 

Additional file 4: Map showing the approximate distribution the 26 
plastid haplotypes (HI -26) recognized across the temperate N 
Hemisphere. Pies indicate the proportion of haplotypes registered for 
individual taxa/cytotypes and the size of each pie correlated to the 
sample size from their generalized sampling areas. 
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